ABSTRACT

In dynamic environments, optimal deliberation about what actions to perform is impossible.
Instead, it is sometimes necessary to trade potential decision quality for decision timeliness. One
approach to achieving this trade-off is to endow intelligent agents with meta-level strategies that
provide them guidance about when to reason (and what to reason about) and when to act. We
describe our investigations of a particular meta-level reasoning strategy, filtering, in which an
agent commits to the goals it has already adopted, and then filters from consideration new options
that would conflict with the successful completion of existing goals [1]. To investigate the utility
of filtering, we conducted a series of experiments using the Tileworld testbed [12]. Previous
experiments [...] version of the Tileworld [...] demonstrate the [...] and demonstrate some
significant environmental influences on the value of filtering.

INTRODUCTION 


Many applications involve systems that are situated in dynamic environments [...] examples from
aerospace, communications, medical, process control and [...]. In such environments, optimal
deliberation about what actions to perform is impossible [...] options are subject to change during
the deliberation. A system that blindly pushes forward with its original deliberation process,
without regard to the amount of time it is taking or to what is meanwhile going on, is not likely
to make rational decisions about what to do. It is thus sometimes necessary to trade potential
decision quality for decision timeliness [..., 4, 10, 13].

One approach is to endow intelligent systems, or agents, with [...] that would conflict with the
successful completion of existing goals.

To investigate the utility of filtering, [...] experiments using a simple, abstract testbed: the
Tileworld. Our use [...] experimental research methodology that we [...] [..., especially
Section 5.2]. We first described the Tileworld several years ago [12]. Since the[n] [...] of
enhancements to the [...] experiments. A simplified version of the [...] filtering.


THE TILEWORLD TESTBED 


The Tileworld testbed is a tool that we developed [...] agents in dynamic [environments. We
first] described the Tileworld several years ago [12]; since [...] Tileworld User's Guide [4].

The Tileworld consists of an abstract, dynamic [...] environment with an embedded agent. It is
built around the idea of an agent carrying tiles around a two-dimensional grid, delivering them to
"holes", and avoiding obstacles. The environment is dynamic; during the course of a simulation
objects appear [...] specified by the researcher. The Tileworld is [...] environment. In keeping
the environment divorced from any particular application, our goal has been to provide a tool that
allows [...] concerned with any application to focus on what [...] environment, without the
confounding effects of [...] environment itself. We have, in other words, traded realism (in the
short run, at least) for sufficient control to allow for systematic experimentation. This
methodological [...] made in several other [...] the independently developed NASA Tileworld [8]
[...] also organized around the theme of agents situated [...] grids, [...] tiles. See Hanks,
Pollack, and Cohen [3] for a discussion of the methodological issues surrounding the use of
simplified testbeds.

A researcher using the Tileworld can manipulate and monitor characteristics of the simulated
environment (such as how quickly it changes) and of the embedded agent (such as what kind of
meta-level reasoning principles it employs). These characteristics can be defined [...] or by
storing [...].

In originally developing the Tileworld, we adopted a minimalist philosophy: our policy was
to keep the environment as abstract and simple as possible, in order to provide the experimenter
[...] and to ensure that the system's performance [...] not [...] of any given domain. Each of the
parameters in the original Tileworld was introduced because it represented an abstraction of what
we believed to be a potentially important [...] characteristic. Thus, the original Tileworld
allowed us to manipulate a [...] number of characteristics, including the degree of dynamism in the
environment, the degree of uniformity of task difficulty, and the degree of uniformity of task
reward.

Our early experiences with the Tileworld led us to conclude that, while this was a good set of
parameters with which to begin, some extensions were necessary to support the range of experiments
[...]. In particular, in the original system, agents had only a single type of top-level goal,
hole-filling, and no matter how they achieved such a goal, they were always awarded the [...]
(namely, the [...] associated with the hole in question). This made the original Tileworld
environment [...] about which to deliberate, and it was thus difficult to study the trade-offs
involved in extra deliberation. We thus extended the system in several ways:

• [...] the requirement that agents maintain fuel level: we can thus now study goals of [...].

• Furthermore, [...] to maintain their fuel levels, we added a "gas station" where they can go to
[...]. We also added a top-level goal of building stockpiles of tiles having particular
shapes at strategic locations on the grid. Thus, where for the original Tileworld agent all
top-level goals were of the same type (fill a hole), in this version there are several different
top-level [...].

• We assigned "shapes" to tiles and holes, and changed the reward structure associated with
successfully filling a hole. The agent may fill a hole with any tiles, but it gets more [...] if
it uses tiles whose shapes match the shape associated with the hole. As a result, there is now
[...] investigating trade-offs between [the] value of alternative plans to achieve [a] goal. An
additional complication is that the agent can carry more than one tile [...]; the more tiles it
carries, the more rapidly it burns fuel. Again, this means that the quality of alternative
solutions to some goal may [...].

In implementing these extensions, [...] to our original minimalist philosophy; we only [...]
experiments of interest to us. However, [...]
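The shape-matching reward and the fuel penalty for carrying several tiles can be sketched as
follows. This is a minimal illustration, not the Tileworld implementation: the names `Tile`,
`Hole`, `fill_reward`, and `fuel_cost`, and every numeric constant, are hypothetical, since the
text gives no concrete values.

```python
from dataclasses import dataclass

# All constants are hypothetical; the text gives no numeric values.
MATCH_BONUS = 2.0          # assumed multiplier for a shape-matched tile
BASE_FUEL_PER_STEP = 1.0   # assumed fuel burned per move with no tiles
FUEL_PER_TILE = 0.5        # assumed extra fuel per carried tile, per move


@dataclass(frozen=True)
class Tile:
    shape: str


@dataclass(frozen=True)
class Hole:
    shape: str
    base_score: float  # assumed score per tile placed, before any bonus


def fill_reward(hole, tiles):
    """Any tile may fill the hole, but shape-matched tiles earn more."""
    total = 0.0
    for tile in tiles:
        bonus = MATCH_BONUS if tile.shape == hole.shape else 1.0
        total += hole.base_score * bonus
    return total


def fuel_cost(steps, tiles_carried):
    """Fuel burned over a trip; carrying more tiles burns fuel faster."""
    return steps * (BASE_FUEL_PER_STEP + FUEL_PER_TILE * tiles_carried)
```

Under these assumed numbers, a plan that collects only matching tiles earns a higher reward per
tile but may require a longer trip, and a plan that carries many tiles at once saves trips but
burns more fuel per step. This is the kind of trade-off between alternative plans that the
extensions were meant to expose.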

EXPERIMENTAL RESULTS 

We now present the experiments [...] properties of filtering [...]. Because of space limitations,
we [...] motivation or details of the mechanism for filtering here, but see [1, 10, 11]. [...],
predicated on the earlier work on IRMA, was that, in a dynamic environment, [...] can result in
overall improved performance, despite [...] behavior will sometimes be suboptimal. [...]
successful in this; like Kinny and Georgeff, using a simplified version of [...], [...] found that
the influence of commitment is bounded, [...].

[...] experiment used a factorial [...] factors: degree [of commitment, for] which we had 14
levels, and [...]. "[Degree of commit]ment" refers to the strength of the filtering strategy: the
most committed agent seldom reconsidered its options until it had completed its current plan, while
[the least] committed agent always interrupted its actions to weigh the [...]. "Degree of
dynamism" refers to the average [...] of [the] environment: how frequently, on average, do
exogenous [...]. [...] parameter was effectiveness, which is a normalized [...]. There were a total
of 51 trials conducted per experimental condition, [and the] length of each trial was 80,000 clock
ticks. [...]

[...] comparison of the effectiveness of the most and least committed agents, shown in Figure 1.
(When all fourteen levels of commitment are plotted, there are some line crossings, but the trend
relating effectiveness and degree of commitment is still clear; see [11].)
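One way to picture "degree of commitment" is as the height of the bar that an incompatible option
must clear before the agent reconsiders its plan. The sketch below assumes exactly that. The
`Option` fields, the `importance` estimate, and the `override_threshold` parameter are all
hypothetical, and the mechanism actually used in the experiments is described in [1, 10, 11], not
here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    compatible: bool    # compatible with the plans already adopted?
    importance: float   # hypothetical prima facie estimate of value


def options_to_deliberate(options, override_threshold):
    """Return the options that survive filtering.

    Compatible options always pass. An incompatible option passes only if
    its importance reaches the (assumed) override threshold.
    """
    survivors = []
    for opt in options:
        if opt.compatible or opt.importance >= override_threshold:
            survivors.append(opt)
    return survivors
```

With `override_threshold` at zero, every option reaches deliberation, which corresponds loosely to
the least committed agent. With a very high threshold, incompatible options are almost never
reconsidered, which corresponds loosely to the most committed agent.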

Table 1 summarizes the significance of the difference in performance between the most com- 
mitted agent we ran and the least committed one. It shows that the difference between their per- 
formance, although not enormous, is statistically significant everywhere except at the endpoints. 
Further analysis reveals the reason for the collapse at the endpoints. In the slowest environment 
we studied, there was a great deal of variation in the agent’s performance, because it was possible 
for the environment sometimes to evolve in a way that enabled the agent to succeed at all the tasks 
it was presented. Because of the high degree of variation in the scores, there was no statistically
significant difference between the agents’ performance in these slowly changing environments. At
the other endpoint, the most quickly changing environment, the situation is different. In this
environment there was very little variation in the scores: both agents scored very poorly, because
they were unable to succeed at all but a few of the tasks they were presented. This bottoming
effect resulted in a lack of significance between the agents’ scores in this environment.
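The effect of score variation on significance can be seen in a small sketch of a two-sample t
statistic. The text does not say which variant of the t-test was used; the pooled-variance
(Student) form below is an assumption, and the function name `pooled_t` is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    # Pooled estimate of the common variance of the two samples.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sp2 * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / se
```

For a fixed difference in means, widening the spread of the scores enlarges the standard error and
shrinks t, which matches the explanation given for the slowest environment. In the fastest
environment the spread is small, but so is the difference in means, because both agents score near
the floor.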

T-Test Results: Significance of Difference between Mean Effectiveness

Dynamism    t           Significance
1           1.121692    P < .15
3           3.129425    P < .0025
5           3.148238    P < .0025
7           4.130018    P < .0005
10          5.727610    P < .0005
15          6.686329    P < .0005
20          7.000076    P < .0005
30          4.909135    P < .0005
40          3.164884    P < .0005
50          2.260098    P < .02
60          0.709967    P < .25

Table 1: Analysis of Value of Commitment

Figure 2 plots the difference in these two agents’ performance. The graph shows that the value
of commitment, while always positive, is a function of the degree of dynamism in the environment.
As dynamism increases, the marginal value of commitment first increases, then peaks, and
subsequently drops off, although it does not become negative within the bounds of the experiment.
This result can be explained as follows. In slower worlds, there are fewer options presented to the
agent and, hence, fewer opportunities for filtering to result in savings in reasoning cost.
Moreover, the advantages of reducing reasoning are minimal, since there is generally enough time to
deal with options. As the world becomes more dynamic, there are more options for consideration, and
the penalty for extra reasoning increases, because there is less time to respond to those options.
This explains why filtering increasingly pays as dynamism increases. However, another influence
comes into play as the rate of change in the environment increases: the missed-opportunity cost
grows. As the world changes more rapidly, it becomes increasingly important for the agent to
succeed at each individual task, since it will fail to complete a larger proportion of the
potential tasks. The shape of the graph in Figure 2 is thus explained by the tension between the
increased benefits of reduced reasoning and the increased penalties of missed opportunity, both of
which vary directly with the rate of change in the world. We expect to see a similar pattern of
competing influences on the usefulness of filtering in other domains, and we will pay particular
attention to the shape and peak of the filtering-value curve in other domains, as it reveals useful
information about the relative significance of reasoning overhead and missed-opportunity costs.
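The rise-and-fall pattern is also visible in the t statistics reported in Table 1. The sketch below
takes those values as given and locates their peak. Note that t is only a proxy here: Figure 2
plots the difference in effectiveness itself, which is not tabulated. The 0.05 cut-off and the
helper names are assumptions made for illustration.

```python
# (dynamism, t statistic, upper bound on p) as reported in Table 1.
TABLE_1 = [
    (1, 1.121692, 0.15),
    (3, 3.129425, 0.0025),
    (5, 3.148238, 0.0025),
    (7, 4.130018, 0.0005),
    (10, 5.727610, 0.0005),
    (15, 6.686329, 0.0005),
    (20, 7.000076, 0.0005),
    (30, 4.909135, 0.0005),
    (40, 3.164884, 0.0005),
    (50, 2.260098, 0.02),
    (60, 0.709967, 0.25),
]

ALPHA = 0.05  # assumed conventional threshold; the text does not name one


def peak_dynamism(rows):
    """Dynamism level at which the t statistic is largest."""
    return max(rows, key=lambda row: row[1])[0]


def significant_levels(rows, alpha=ALPHA):
    """Dynamism levels whose reported p bound is at or below alpha."""
    return [d for d, _t, p_bound in rows if p_bound <= alpha]


def rises_then_falls(rows):
    """True if t strictly increases up to its peak, then strictly decreases."""
    ts = [t for _d, t, _p in rows]
    k = ts.index(max(ts))
    rising = all(x < y for x, y in zip(ts[:k], ts[1:k + 1]))
    falling = all(x > y for x, y in zip(ts[k:], ts[k + 1:]))
    return rising and falling
```

On the tabulated values the peak falls at dynamism 20, and only the two endpoint levels (1 and 60)
miss the assumed 0.05 threshold, consistent with the statement that the difference is significant
everywhere except at the endpoints.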

CONCLUSION 

We provided a brief description of a set of experiments aimed at assessing the value of a strategy
that may be incorporated in intelligent agents to help focus their reasoning in dynamic
environments. The strategy, filtering, involves screening from consideration options for action
that are incompatible with already established plans, except where those options are prima facie
important enough to trigger a pre-defined override. We relied on a testbed system, the Tileworld,
to conduct our experiments. We have made a number of enhancements to the Tileworld since the time
it was originally developed, and we described some of the more important of those here. Our
experiments demonstrate that filtering is a feasible strategy, at least within the Tileworld, a
result that suggests to us that it is worth investigating this strategy in more complex systems.
Additionally, our results showed an interesting relationship between the rate of change in the
environment and the amount of benefit that one can derive from using a filtering strategy.

Figure 2: Difference in Effectiveness between Most and Least Committed Agents